THIRD ANNUAL REPORT


Variables Related to Accuracy in Interpersonal Perception
Victor B. Cline and James M. Richards, Jr.

University of Utah

INTRODUCTION 

For the past three years research has been conducted in the area
of person perception at the University of Utah with grant support from
the Group Psychology Branch of the Office of Naval Research. The basic
approach has been to use sound-color movies of people in interview
situations as stimulus material for judges in making evaluations and
judgments about other people. In the three years that this project has
been under way, the number of films used has been reduced from 24 to 6,
using methods approximating standard item analysis. Heterogeneous groups
of judges have viewed these sound movies and made a variety of predictions
about the real-life, verbal, and test behavior of the subjects seen in
the films. One of the major early findings was that there exists a
certain amount of generality in the ability to judge accurately across
judging instruments, as well as across films. This finding has held up
in some seven independent experiments reported in detail in the first
two Annual Reports (Cline and Richards, 1958, 1959).

Cronbach (1955) has, in the past, strongly criticized global
approaches to obtaining accuracy-of-judgment scores. He holds that
global scores are virtually meaningless because they lump together all
sorts of response sets, and to meet this problem he has proposed several
complex methods of securing so-called "component scores." In previous
research, analysis of some of these component scores suggested by
Cronbach led to the conclusion that, even though there is a modest degree
of generality in judging ability, there are also at least two major types
of judging accuracy. The first major type was "Sensitivity to the
Generalized Other" (or, in Cronbach's terminology, "Stereotype
Accuracy"), and the second was "Interpersonal Sensitivity" (or, in
Cronbach's terms, "Differential Accuracy" and/or "Differential
Elevation"). These results were obtained using the Trait Rating type of
instrument, in which judges rated the persons interviewed on a six-category
Likert-type scale. Bronfenbrenner and his associates (1958), in a
completely independent study, obtained essentially the same results.

However, difficulties were repeatedly encountered in working with
and interpreting some of Cronbach's components of accuracy scores (using
the Trait Rating instrument). In going back over Cronbach's article, it
was found that he appeared to contradict himself in discussing the
meaning and interpretation of some of these components. In addition,
the conclusion was reached that making ratings on rather haphazardly
chosen traits is not a very meaningful judging task, and it was therefore
decided that a new judging instrument was needed which would allow
clarification of the use and interpretation of these component accuracy
scores. Since part of the filmed interview which the judges saw involved
questions relating to one's religious values and beliefs, a new instrument
called the Belief-Values Inventory was developed. On this instrument,
the judge is required to predict the responses of the person seen in the
film to such statements as "The idea of God is just a fiction" on a
five-category Likert-type scale.

This proved to be a much more satisfactory instrument than the
Trait Rating measure, and accordingly in the past year a major study of
Belief-Values Inventory component scores was carried out. This study,
which is reported in detail later in this report (see page ), has
suggested some modifications in Cronbach's analytic scheme.
BRIEF RESUME OF THIRD YEAR'S WORK

The third year (1959-1960) emphasized research in the following areas:

1. An intensive study of the analysis of accuracy scores into
components, using an analytic scheme similar to, but with some
modifications of, the scheme suggested by Cronbach (1955).

2. Analyses of the data of an experiment in which the judgments of
individuals are compared with those of groups (where group members
collaborate in decision and judgment making).

3. An investigation of the clinical versus statistical prediction
problem in terms of components of accuracy scores.

4. Sponsoring a group-discussion-type symposium entitled "New Frontiers
in Person Perception Research" at the annual meetings of the
American Psychological Association, September, 1960, in Chicago,
Illinois.

5. The development of new interviews (to be filmed) was undertaken
and is currently in progress.

6. The further development and refinement of new judging instruments
was initiated on the basis of earlier research findings with
components of accuracy scores. This work is also in progress.


CHAPTER  I 


Accuracy  Components  in  Person  Perception  Scores 
and  the  Scoring  System  as  an  Artifact  in 
Investigations  of  the  Generality 
of  Judging  Ability 

The usual procedure in investigations of accuracy of person
perception is to have a judge predict how some other person has responded
to, or will respond to, some standard instrument. Then the judge's
predictions are compared to the other person's actual responses, and
some measure of the degree of agreement between the two is taken as a
measure of the judge's accuracy of person perception.

A special case of this procedure is that in which the judges make
their predictions and the "others" make their responses on a standard
instrument which consists of a series of items, each of which has several
possible responses differing along a scale to which numerical values
can be applied. An example of such a scale would be a series of
adjectives, each of which was rated by the "other" for the degree to
which it was descriptive of himself on a six-category scale ranging from
"Very Like" to "Very Unlike," with corresponding numerical scores
assigned ranging from 1 for "Very Like" to 6 for "Very Unlike." The
measure of accuracy of person perception used with this type of
instrument is based on the numerical difference between the judge's
predictions and the "other's" actual responses. The usual procedure in
investigations of this type is to square this difference score for each
item and then average across items. This average value is commonly
called the D² statistic (Cronbach and Glaser, 1953).
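A minimal Python sketch of the D² computation just described, assuming one judge's predictions and one "other's" actual responses are supplied as equal-length lists of numerical item scores. The function name `d_squared` is hypothetical and not part of the original study.

```python
def d_squared(predicted, actual):
    """Square the predicted-minus-actual difference on each item, then average across items."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("predicted and actual must be non-empty and of equal length")
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(predicted)


# Hypothetical three-item example: the differences are -1, 0 and -2,
# so D² = (1 + 0 + 4) / 3.
print(d_squared([1, 2, 3], [2, 2, 5]))
```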
Cronbach (1955) has strongly criticized treatment of accuracy of
person perception in this global fashion, and has proposed a more
analytic treatment based on mathematically independent components of
the D² statistic. Cronbach's analysis is most clearly pertinent to
those situations in which judges predict the responses of a series of
"others," rather than a single "other." In this case, Cronbach treats
these judgments as a matrix X in which the "others" whose responses are
predicted compose the columns of the matrix, and the items on which
predictions are made compose the rows of the matrix. Thus, each element,
Xij, of this matrix indicates that "other" number j was predicted to
have made the response with numerical value Xij on item number i. There
is one such matrix for each judge. In order to compute the measure of
accuracy of prediction and its various components, each such matrix is
compared to a criterion matrix C, in which each element Cij is the
numerical value of the actual response of "other" j on item i. D², in
Cronbach's scheme, is broken down into four basic components based on
discrepancies between numerical values of predicted responses and
numerical values of actual responses. Three of these four components can
be further broken down into a correlation term and a variance term.

These  components  are: 

(1) Elevation - This is a measure of the difference between the
mean of the numerical values in the judgment matrix, taken over all
"others" and all items, and the corresponding mean of the numerical
values in the criterion matrix. Cronbach states that this component
is primarily a measure of the way in which each judge uses the rating
scale rather than a measure of judging ability.

(2) Differential Elevation - This component is a measure of the
difference between the means of the columns of the judgment matrix and
the means of the columns of the criterion matrix, with the contribution
of the elevation component eliminated. It is thus a measure of the
ability to predict differences between "others" taken across items.

This component can be further broken down into (a) a correlation term,
which measures the extent to which the judge arranges "others" in the
same order in which they are ordered in the criterion matrix, and (b)
a variance term, which measures the extent to which judged differences
between "others" are large or small.
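The split into a variance term and a correlation term rests on a standard identity: for two sets of deviation scores a and b, each with mean zero and population standard deviations sa and sb, the mean squared difference equals (sa - sb)² + 2·sa·sb·(1 - r). The Python sketch below applies this identity to the column (per-"other") means of a judgment matrix and a criterion matrix. The function name `de_split`, the example numbers, and the convention of reporting no correlation when either set of means has zero spread are assumptions made for illustration.

```python
import math


def de_split(judged_means, actual_means):
    """Split Differential Elevation (DE²) into a variance term and a correlation term.

    judged_means: column means of one judge's matrix, one value per "other".
    actual_means: the corresponding column means of the criterion matrix.
    Returns (DE², variance term, correlation term, r). r is None when either
    set of means has no spread; the correlation term is then 0.
    """
    n = len(judged_means)
    gx, gc = sum(judged_means) / n, sum(actual_means) / n
    a = [x - gx for x in judged_means]  # judged deviations, elevation removed
    b = [c - gc for c in actual_means]  # actual deviations, elevation removed
    sa = math.sqrt(sum(v * v for v in a) / n)  # population SDs
    sb = math.sqrt(sum(v * v for v in b) / n)
    de2 = sum((x - c) ** 2 for x, c in zip(a, b)) / n
    r = sum(x * c for x, c in zip(a, b)) / (n * sa * sb) if sa and sb else None
    variance_term = (sa - sb) ** 2
    correlation_term = 0.0 if r is None else 2 * sa * sb * (1 - r)
    return de2, variance_term, correlation_term, r


de2, v_term, r_term, r = de_split([2.0, 3.0, 4.5], [1.5, 3.0, 5.0])
print(de2, v_term + r_term, r)
```

The same identity applies to Stereotype Accuracy if row (item) means are passed in place of column means.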

(3) Stereotype Accuracy - This component is a measure of the
difference between the means of the rows (or items) of the judgment
matrix and the corresponding means of the criterion matrix, again with
the contribution of the elevation component removed. This component
also can be broken down into (a) a correlation term and (b) a variance
term, and is a measure of how accurately the judge predicts the responses
of the typical "other" and/or people-in-general to the items.

(4) Differential Accuracy - This component is a measure of the
difference between the scores for "others" on individual items in the
judgment matrix and the corresponding scores in the criterion matrix,
where in each case an individual "other's" score is taken as a deviation
both from his own mean and from the item mean. This component can also
be broken down into a correlation term and a variance term, and is
averaged across items to obtain an overall measure. Cronbach considers
this component, and particularly its correlation term, the most
appropriate measure of what is ordinarily meant by accuracy of person
perception.
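To make the four components concrete, here is a minimal Python sketch for a single judge. It assumes X and C are given as lists of rows, with items as rows and "others" as columns, as in Cronbach's layout above. The sketch works on the difference matrix X - C, which is equivalent to comparing judged and criterion deviations because taking means is linear. The labels E2, DE2, SA2 and DA2 and the example numbers are our own and are hypothetical.

```python
def mean(values):
    values = list(values)
    return sum(values) / len(values)


def cronbach_components(X, C):
    """Split one judge's D² into Elevation, Differential Elevation,
    Stereotype Accuracy and Differential Accuracy (all squared terms).

    X, C: equal-shaped lists of rows; rows are items i, columns are "others" j.
    """
    k, n = len(X), len(X[0])
    # Work with D = X - C: a difference of means equals the mean of differences.
    D = [[X[i][j] - C[i][j] for j in range(n)] for i in range(k)]
    grand = mean(d for row in D for d in row)
    item_means = [mean(row) for row in D]                              # rows
    other_means = [mean(D[i][j] for i in range(k)) for j in range(n)]  # columns
    return {
        "D2": mean(d * d for row in D for d in row),
        "E2": grand ** 2,
        "DE2": mean((m - grand) ** 2 for m in other_means),
        "SA2": mean((m - grand) ** 2 for m in item_means),
        "DA2": mean((D[i][j] - item_means[i] - other_means[j] + grand) ** 2
                    for i in range(k) for j in range(n)),
    }


X = [[1, 2, 4], [3, 3, 5]]  # hypothetical judge: 2 items x 3 "others"
C = [[2, 2, 5], [4, 2, 6]]  # hypothetical criterion responses
parts = cronbach_components(X, C)
print(parts)
```

Because the four pieces are mutually orthogonal, they always sum to D². In this invented example D² = 5/6, with DE² = 0.5, E² = 0.25, SA² = 1/36 and DA² = 1/18.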

Recent research by Bronfenbrenner and his associates (1958) and by
the authors (Cline and Richards, 1958) has suggested that such an
analytic scheme is a promising approach in studies of person perception.
Bronfenbrenner, using a similar, but not identical, approach, found two
relatively independent aspects of the ability to predict the responses of
others. The first is "Sensitivity to the Generalized Other" (a measure
similar to Cronbach's Stereotype Accuracy), which is the ability to
predict the mean responses of several people on individual items. The
second is "Individual Sensitivity" (a measure similar to Cronbach's
Differential Accuracy), which is the ability to order individuals
correctly on items. In the research of the authors, Cronbach's scheme was
used specifically in its difference-score form. The judging instrument
used in this study was the Trait Rating test, a list of twenty-five
adjectives which judges rated on their degree of similarity to ten
standard "others" presented by means of a sound-color movie. The results
of this experiment suggested that over-all judging ability consisted
mainly of two independent parts: "Stereotype Accuracy" and "Differential
Accuracy" (using Cronbach's terminology).

The results of these studies appear at first glance to offer
striking confirmation of each other, particularly since quite
different procedures were used, and in combination to confirm the
importance of treating person perception scores in the way suggested by
Cronbach. In spite of the promise of these results, however, in
subsequent research the authors encountered persistent difficulties in
the interpretation of these components, particularly when an attempt
was made to break down Differential Elevation, Stereotype Accuracy, and
Differential Accuracy into a correlation term and a variance term. This
difficulty came, through the course of much work, to be focused on the
Differential Elevation component and its correlation term, largely
because Cronbach (1955) appeared to contradict himself about the
interpretation of these measures on the same page. He first stated that
these measures reflect primarily whether or not the judge interprets
the words defining the scale in the same way as the "others" do, and
that they therefore appear "relatively unfruitful" as a source of
information on his perception of "others." He then stated that
these scores are measures of the judge's sensitivity to individual
differences.

In attempting to clear up these difficulties of interpretation,
the authors felt they needed an instrument in which the predictions
themselves could be interpreted more easily than they could using a
typical Trait Rating test. They therefore developed a new judging
instrument, the Belief-Values Inventory. This judging instrument
required the judge to predict "others'" responses to twelve Likert-type
items dealing with religious beliefs and values. A sample item is:

When in doubt, I have found it best to stop and ask God for
guidance.

A. strongly agree

B. agree

C. neither agree nor disagree

D. disagree

E. strongly disagree

In the filmed interview the "other" had been asked direct questions
about his attitudes toward religion.

Subsequent research with this instrument indicated that the
difficulties with this type of analytical treatment arise largely
because the scoring system has a differential effect upon the components
involved in a Cronbach-type analysis. With items of the type included
in the Belief-Values Inventory, there are two possible scoring methods.
The first of these is to score "Strongly Agree" as 1, "Agree" as 2,
etc., without regard to whether or not, on that particular item,
"Strongly Agree" is a pro-religious answer. The second possible
scoring system is to score the most conventional pro-religious response
as 1, regardless of whether that answer is "Strongly Agree" or
"Strongly Disagree." If the first of these scoring systems is used, the
Stereotype Accuracy variance is large, but the Differential Elevation
variance is made artificially small. On the other hand, if the second
of these scoring systems is used, the Stereotype Accuracy variance is
artificially reduced, while the Differential Elevation variance is
maximized. These effects are illustrated by Table I, which presents
the responses of three hypothetical persons to items of this type, with
each item score presented in both scoring systems. In Table I, the first
hypothetical person always answered with the most conventional religious
answer, the second hypothetical person always answered with the second
most conventional religious answer, and the third hypothetical person
always gave the middle or neutral response.

In this table the consistency of responding by each person is, of
course, somewhat exaggerated to make the point clear.
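The effect of the two scoring systems can be illustrated with a small Python sketch. Table I itself is not reproduced here, so the data below are an invented stand-in in the same spirit: three hypothetical persons give the most conventional, the second most conventional, and the neutral response on four hypothetical items, of which two are assumed to be reverse-worded (pro-religious answer = "Strongly Disagree").

```python
from statistics import mean, pvariance

# Hypothetical four-item inventory scored 1-5 (1 = Strongly Agree).
# Assumed wording: on items 2 and 4 the pro-religious answer is "Strongly Disagree".
REVERSED = [False, True, False, True]
# 1 = most conventional religious answer, 2 = second most, 3 = neutral.
PERSON_LEVELS = [1, 2, 3]


def system_1(level, reversed_item):
    """'Strongly Agree' always scored 1, whatever its religious direction."""
    return 6 - level if reversed_item else level


def system_2(level, reversed_item):
    """Most pro-religious answer always scored 1 (item wording is irrelevant)."""
    return level


def spreads(score):
    """Variance of item means and of person means under one scoring system."""
    M = [[score(lvl, rev) for lvl in PERSON_LEVELS] for rev in REVERSED]
    item_means = [mean(row) for row in M]
    person_means = [mean(col) for col in zip(*M)]
    return pvariance(item_means), pvariance(person_means)


for name, score in [("system 1", system_1), ("system 2", system_2)]:
    print(name, spreads(score))
```

Under system 1 the item means differ but every person's mean is 3, so there is nothing for a Differential Elevation correlation to capture. Under system 2 the persons differ but every item mean is 2, so the Stereotype Accuracy spread disappears instead.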

In addition to the effects of the scoring system on the variance
of Stereotype Accuracy and Differential Elevation, several other things
are apparent from Table I. The first of these is that if these three
hypothetical persons were used as "others" in our films and the first
scoring system were used (i.e., where "Strongly Agree" is always
scored as 1, without regard to whether this is in the religious or
non-religious direction), then no matter what degree of accuracy a judge
attained in predicting their responses, the Differential Elevation
correlation component could take no other value than .00. If one then
tried to relate the Differential Elevation correlation term to other
measures of judging ability, there obviously could be no
relationship, and one might erroneously conclude that there is no
generality of judging ability. Use of the second scoring system (i.e.,
where the most pro-religious response is always scored 1 regardless of
whether it is "Strongly Agree" or "Strongly Disagree"), however, has a
similar effect on the Stereotype Accuracy correlation and again could
lead to a false conclusion that there is no generality of judging
ability. It will also be seen from Table I that if the first scoring
system were used in a study of judging ability, Differential Elevation
and also Elevation would reflect primarily the extent to which judges
interpreted items in the same way as the "others." If the second
scoring system were used, however, Differential Elevation would be a
measure of the judges' "sensitivity to individual differences" in overall
religiosity, and Elevation would be a measure of the judged average
religiosity of the group of "others." Thus the apparent paradox in
Cronbach's formulation is resolved. All of this, taken together,
strongly suggests that in investigations of accuracy of person perception,
and particularly of its generality, neither of these scoring systems
is by itself satisfactory. Rather, the first scoring system
should be used in computing Stereotype Accuracy and its components,
and the second scoring system should be used in computing Differential
Elevation and its components. It should be emphasized that, in the
hypothetical example, the items differed greatly in their average degree
of endorsement and the persons differed greatly in their over-all degree
of religiosity. In studies of accuracy of interpersonal perception,
both of these would be important, and yet one or the other would
inevitably be artificially eliminated if either scoring system alone
were used.

These two scoring systems also have another effect which is not
readily apparent in the hypothetical example, since that example eliminates
all the persons-by-items interaction that would occur in a real problem.
This effect is that the Differential Accuracy component and its
correlation and variance constituents will all take on different values
depending on which scoring system is used, and the authors are aware of
no criterion which would indicate, in a real problem, which of these values
are the most appropriate measures of judging ability. This led the
authors to the conclusion that none of the values of Differential
Accuracy and its constituents are particularly good measures of judging
ability, and to drop it from further consideration in their research.

These measures have been replaced with a new index of judging ability
which the authors have chosen to call Interpersonal Accuracy. There is no
difference-score form of this measure; it consists only of a correlation
term and a variance term. The correlation term is computed by
determining the correlation between each judge's predicted values and
the corresponding actual responses by "others" on individual items and
then averaging across items (without converting these scores in terms
of their discrepancy from item and person means, as is the case with
Differential Accuracy). Similarly, the Interpersonal Accuracy
variance term involves the computation of the variance of each judge's
predictions on individual items, averaged across items. With regard to
how this index fits into Cronbach's scheme, the authors are of the
opinion that it is a linear combination of Differential Elevation and
Differential Accuracy. It offers the strong advantage over other
measures that it is invariant under changes of scoring system.
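The Interpersonal Accuracy terms described above can be sketched in Python as follows. The sketch rests on several assumptions. Matrices are lists of rows (items) by columns ("others"). The variance term uses the population variance. Item-level correlations are averaged as raw r (the Fisher's Z conversion used in the study's computer program is not applied here). An item on which either the judge or the "others" show no variation is rejected, because r is undefined there. The example also checks the invariance claim by reverse-scoring one item in both matrices. All names and data are hypothetical.

```python
import math
from statistics import mean, pvariance


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined for an item with no variation")
    return sxy / math.sqrt(sxx * syy)


def interpersonal_accuracy(X, C):
    """Correlation term and variance term for one judge.

    X, C: lists of rows; each row is one item, each column one "other".
    Correlation term: item-by-item r between predicted and actual, averaged.
    Variance term: variance of the judge's predictions on each item, averaged.
    """
    r_term = mean(pearson(x_row, c_row) for x_row, c_row in zip(X, C))
    v_term = mean(pvariance(x_row) for x_row in X)
    return r_term, v_term


def reverse_item(M, i, top=5):
    """Re-key item i on a 1..top scale (the change between scoring systems)."""
    return [[top + 1 - v for v in row] if k == i else list(row)
            for k, row in enumerate(M)]


X = [[1, 2, 4, 5], [2, 2, 3, 5], [4, 3, 2, 1]]  # hypothetical predictions
C = [[1, 3, 4, 4], [1, 2, 4, 5], [5, 3, 2, 2]]  # hypothetical actual responses
print(interpersonal_accuracy(X, C))
print(interpersonal_accuracy(reverse_item(X, 1), reverse_item(C, 1)))
```

Reversing an item negates the deviations of both the predicted and the actual scores, so each item correlation and each prediction variance is unchanged. This is why both calls print the same pair.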

There are several additional considerations in the interpretation
of this hypothetical example. The first of these is that there is a
strong general "religiosity" factor underlying the questions. In
Cronbach's scheme, this general factor should be tapped by the
Differential Elevation component, and this does occur under the scoring
system in which the conventional religious answer is always scored 1.

It might be objected, therefore, that the scoring system in which
"Strongly Agree" is always scored 1 "randomizes" this general factor.
This effect of the scoring system is, however, exactly the point of
the hypothetical example, and it should be emphasized again that the
scoring system in which "Strongly Agree" is always scored 1 is the only
scoring system which permits the real differences in the average degree
of endorsement of the items (Stereotype Accuracy) to appear. It should
also be noted that the questions in the hypothetical example were
intentionally constructed to emphasize the inappropriateness of this
scoring system for the Differential Elevation component. In a less
extreme, more realistic case, this inappropriateness would be less clear.
The fact that there is a strong general factor does contribute greatly to
the clarity of interpretation of the Interpersonal Accuracy component,
and the present authors agree with Cronbach's position that several
factorially pure sets of items analyzed separately are preferable to one
factorially complex set of items treated in a global fashion.

If the argument of the authors is accepted to this point, it is
still an open question whether the considerations outlined have any
practical effect on investigations of accuracy of person perception.

A study providing some information on this point has been
conducted. In this experiment 46 undergraduates, both male and female,
at the University of Utah predicted the responses of six standard
"others," presented through the filmed interview procedure, on the
Belief-Values Inventory. Details of the experimental procedure of
using these filmed interviews are presented elsewhere (Cline and
Richards, 1960 A). Using a program developed for the IBM 650 computer,
the predictions of these judges were scored twice against the criterion,
once with each of the two scoring systems discussed above, and the
various judgment scores were intercorrelated. When this program is used,
all correlation terms are expressed in terms of Fisher's Z. Results of both
of these analyses are presented in Table II. In Table II, correlations
above the diagonal were obtained when "Strongly Agree" was always
scored 1 regardless of whether or not it represented a pro-religious
answer, and correlations below the diagonal were obtained when the most
conventional pro-religious answer was always scored 1. On the basis of
either of these two groups of correlations alone, one would have to
conclude that there is no consistent pattern of generality in judging
ability, particularly as reflected in the three correlation measures,
but rather an appearance of two relatively independent factors measured
respectively by the Differential Elevation correlation term and the
Stereotype Accuracy correlation term. This would appear to confirm the
previous results of Bronfenbrenner and his associates (1958) and of Cline
and Richards (1960 A).
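Fisher's Z, mentioned above, is the transformation z = ½ ln[(1 + r)/(1 - r)], which is the inverse hyperbolic tangent of r. It makes sampling distributions of correlations more nearly normal and is commonly applied before correlations are averaged or compared. A minimal Python sketch follows. The report does not say here how the IBM 650 program combined the transformed values, so the back-transformed average in `average_r_via_z` (a hypothetical helper) is only an illustrative assumption.

```python
import math


def fisher_z(r):
    """Fisher's Z: 0.5 * ln((1 + r) / (1 - r)), i.e. atanh(r)."""
    if not -1 < r < 1:
        raise ValueError("r must lie strictly between -1 and 1")
    return math.atanh(r)


def average_r_via_z(rs):
    """Average correlations in the Z metric, then convert back to r."""
    return math.tanh(sum(fisher_z(r) for r in rs) / len(rs))


print(fisher_z(0.5), 0.5 * math.log(3))
print(average_r_via_z([0.2, 0.6]))
```

Averaging in the Z metric gives a result slightly above the plain mean of 0.4 for these two values, because Z stretches larger correlations more than smaller ones.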

A further analysis of these data was made, however, in which
judgment scores were intercorrelated across scoring systems in such a
way that each component was scored most appropriately. More specifically,
Stereotype Accuracy and its correlation and variance terms were computed
using the scoring system where "Strongly Agree" was always scored 1, and
Differential Elevation and its components, as well as Elevation, were
computed using the scoring system where the most pro-religious answer was
always scored 1. Results are presented in Table III. In this table, there
is a consistent pattern of a significant degree of generality across the
correlation terms of all components, thus suggesting that judging ability
is, to some degree, a general trait.
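The mixed procedure can be sketched in Python under several assumptions: a five-point scale, a per-item flag marking items on which "Strongly Disagree" is the pro-religious answer, and the same difference-matrix formulas used for Cronbach's components. Function names and data are hypothetical. Elevation, which the study also computed under the second system, is omitted for brevity.

```python
from statistics import mean


def rekey(M, reversed_items, top=5):
    """System 1 -> system 2: re-key items whose pro-religious answer is 'Strongly Disagree'."""
    return [[top + 1 - v for v in row] if rev else list(row)
            for row, rev in zip(M, reversed_items)]


def _diff(X, C):
    return [[x - c for x, c in zip(xr, cr)] for xr, cr in zip(X, C)]


def sa2(X, C):
    """Stereotype Accuracy: squared error in item (row) means, elevation removed."""
    D = _diff(X, C)
    g = mean(v for row in D for v in row)
    return mean((mean(row) - g) ** 2 for row in D)


def de2(X, C):
    """Differential Elevation: squared error in "other" (column) means, elevation removed."""
    D = _diff(X, C)
    g = mean(v for row in D for v in row)
    return mean((mean(col) - g) ** 2 for col in zip(*D))


def mixed_components(X_raw, C_raw, reversed_items):
    """SA² under system 1 (raw scores), DE² under system 2 (pro-religious = 1)."""
    return {
        "SA2": sa2(X_raw, C_raw),
        "DE2": de2(rekey(X_raw, reversed_items), rekey(C_raw, reversed_items)),
    }


X_raw = [[2, 2], [4, 4]]  # hypothetical judge: 2 items x 2 "others"
C_raw = [[1, 2], [5, 4]]  # hypothetical criterion
print(mixed_components(X_raw, C_raw, [False, True]))
```

In this toy example DE² computed on the raw scores would be 0, whereas the re-keyed scores give 0.25, a small-scale version of the artifact discussed above.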

These results, in the opinion of the authors, clearly indicate
that the scoring system may be an important artifact in investigations
of the generality question when components of accuracy scores are used,
and they therefore strongly support the argument advanced in the
hypothetical example discussed earlier. It is still most important to
avoid overgeneralization from these results. It is still possible, and
even probable, that with other judging instruments, other judges, or
other persons to be judged, accuracy of stereotype and accuracy of
judgments of individual differences may prove to be truly independent.
Future investigators should, however, on the basis of the results of
this study, guard against a false conclusion that the two main types of
accuracy are independent when that independence represents nothing more
than a scoring-system artifact. The results also suggest that
Interpersonal Accuracy is the most appropriate measure of the ability to
judge individual differences accurately, at least in those cases where
a strong general factor is present in the items on which the ratings are
made.
CHAPTER II


A Comparison of Individuals vs. Groups in Judging Personality

As a practical necessity, people are continually required to
judge, assess, and evaluate their associates subjectively. Frequently,
in the military or in industry, this is a prerequisite for initial
employment, promotion, etc. There have been various approaches to the
quantification of subjective judgments, of which perhaps the most common
have been rating procedures. Since this type of judgment, and the
decisions or courses of action which result from it, have such
far-reaching implications, any research which might further contribute to
our knowledge in this area should be of considerable intrinsic
importance. The purpose of the present study was (1) to determine
whether individuals or groups are more likely to be accurate in making
social judgments (i.e., "predictions" of the behavior and personality of
other individuals), and (2) at the same time to compare different types of
group judgment. These judgments were made on instruments similar to
two different kinds of rating scales commonly used in applied settings.

The rationale of this experiment grew out of the recent survey of
studies comparing group performance and individual performance made by
Lorge, Fox, Davitz, and Brenner (1958). The general conclusion of this
survey was that a group, on almost any task, will perform better than a
typical individual, but not necessarily better than a superior
individual on the task in question. This finding holds whether the
"group performance" is made by a genuine group or is merely a statistical
combination of several independent individual performances. An
unresolved question is the degree to which these findings can be
attributed to a reduction in the variability of the group performance.

The trend of the studies cited in this survey suggested the
hypotheses to be tested in this experiment. These hypotheses are:

1. The accuracy of predictions (about the behavior of other
persons) made by a group of persons arriving at a consensus prediction
through group discussion will be significantly greater than the average
accuracy of the predictions made by the individuals composing the group.

2. The average accuracy of the predictions made by the individuals
composing the group will also be significantly less than the accuracy of
an "artificial group" (i.e., a single prediction derived through a
statistical combination of their individual predictions), and also less
than the accuracy of prediction of the best individual among the
individuals composing the group.

A secondary question has to do with the presence or absence of a
consistent pattern of superiority in accuracy among predictions made by
best individual judges, consensus groups, and "artificial groups."

Method

The subjects were 186 students, both male and female, in the
introductory psychology classes at the University of Utah in the Fall of
1959. The procedure involved the presentation of six filmed interviews,
or "standard others." These were photographed in sound and color, and
were conducted by an actor, a member of the University Theatre staff,
who asked a fairly standard series of questions (to ensure equivalence
across interviews) probing the following areas: (a) personal values,
(b) personality strengths and weaknesses, (c) reaction to the interview,
(d) hobbies and activities, (e) self-conception, and (f) temper.
After each filmed interview had been shown, the projector was
stopped and the subject-judge was required to fill out paper-and-pencil
judging instruments. The next interview was then shown, and so
forth. Details of the development and selection of these films, the
experimental procedures involved, and certain underlying methodological
and theoretical considerations have been published elsewhere (Cline &
Richards, 1958, 1960 A).

In this study, two prediction instruments were used. The first of
these was the Adjective Check List (ACL), which required the subject to
determine which of a pair of adjectives the interviewee had checked
as being descriptive of himself. A sample item is:

14. ___ (a) resourceful
    ___ (b) cheerful

There were 20 such pairs for each of the six films, making a total of
120. The score on the Adjective Check List was the number correct.
Thus the Adjective Check List is similar to a forced-choice rating
procedure.

The second instrument used was the Belief-Values Inventory (BVI). On
this instrument the subject was required to determine (predict) how the
interviewee had responded to a Likert-type scale dealing with religious
beliefs. During the course of the interview, the person in the film
had been asked direct questions in this area. A sample item is:

I feel quite sure God does not exist.

___ (1) Strongly agree
___ (2) Agree
___ (3) Neither agree nor disagree
___ (4) Disagree
___ (5) Strongly disagree

Thus the Belief-Values Inventory is comparable to a graphic rating
procedure.

There were 12 such items for each film or interview. Several
different scores, based on a recent modification by Cline and Richards
(see Chapter I) of an analytic procedure suggested by Cronbach (1955),
were computed from judges' responses to this instrument using a program
developed for the IBM 650 computer. The first of these was a total score,
based on the average of the squared discrepancies (using the
one-to-five-point scale) between each judge's predicted responses for
each interviewee and the actual responses of each interviewee. This is
an error score; to make these scores comparable to the other scores
used in this study, they were converted to accuracy scores through a
standard score transformation, setting the mean equal to 50 and the
standard deviation equal to 10.
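The total-score computation described above can be sketched in Python. This is a minimal illustration, not the IBM 650 program used in the study: the function names and the list-of-rows data layout are hypothetical, the use of the population standard deviation in the transformation is an assumption, and the sign reversal (smaller error, higher accuracy) is inferred from the description of the converted values as accuracy scores.

```python
from statistics import fmean, pstdev

def bvi_error_score(predicted, actual):
    """Mean squared discrepancy between one judge's predictions and the actual
    responses, taken over every interviewee and every item.

    predicted, actual: one row per interviewee, one 1-5 response per item.
    """
    squared = [
        (p - a) ** 2
        for p_row, a_row in zip(predicted, actual)
        for p, a in zip(p_row, a_row)
    ]
    return fmean(squared)

def to_accuracy_scores(error_scores):
    """Standard-score transform to mean 50 and SD 10, with the sign reversed
    so that a smaller error yields a higher accuracy score.

    Uses the population SD (an assumption), so the result has SD exactly 10.
    """
    mean = fmean(error_scores)
    sd = pstdev(error_scores)
    return [50 - 10 * (e - mean) / sd for e in error_scores]
```

For example, two judges with error scores of 0.0 and 1.0 receive accuracy scores of 60.0 and 40.0.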

The next two BVI scores are components of what Cronbach (1955)
has called Stereotype Accuracy. This measures the degree to which each
judge predicts how the group of interviewees as a whole responds to the
judging instrument, and involves the degree to which the means of items
(averaged across interviewees) predicted by each judge correspond to
actual item means. The two scores used in this study are (1) the
correlation between each judge's predicted item means and the obtained
item means, converted to a Fisher's Z, and (2) the variance of each
judge's predicted means. Cronbach has demonstrated these two scores to be
the two parameters of Stereotype Accuracy when the criterion is held
constant, and they permit independent evaluation of the effect of
grouping on accuracy and on variability of prediction in this study.
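A sketch of the two Stereotype Accuracy components for a single judge follows, using the same assumed layout (one row per interviewee, one column per item). The helper names are hypothetical, and using the population variance for the spread of the predicted means is an assumption; the report does not say which divisor was used.

```python
import math
from statistics import fmean, pvariance

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined: zero variance")
    return sxy / math.sqrt(sxx * syy)

def item_means(responses):
    """Mean of each item (column) across interviewees (rows)."""
    return [fmean(column) for column in zip(*responses)]

def stereotype_accuracy(predicted, actual):
    """Return (Fisher Z of r between predicted and actual item means,
    variance of the predicted item means) for one judge."""
    predicted_means = item_means(predicted)
    actual_means = item_means(actual)
    r = pearson_r(predicted_means, actual_means)
    # Fisher's Z is atanh(r); math.atanh raises ValueError when |r| = 1.
    return math.atanh(r), pvariance(predicted_means)
```

For instance, predicted item means of 1, 3, 5 against actual item means of 2, 4, 4 give r = sqrt(3)/2 (about 0.866), a Fisher Z of about 1.32, and a predicted-means variance of 8/3.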


The last two scores on the BVI are measures of Interpersonal
Accuracy. This represents the degree to which judges accurately
predict the responses of interviewees to individual items, and involves
mainly the degree to which judges correctly order the interviewees in
terms of their overall degree of "religiosity." It is, therefore, the
best measure of the kind of accuracy that is the main concern in most
institutional rating situations. Interpersonal Accuracy, like Stereotype
Accuracy, has two independent parameters, a correlation term expressed
in terms of Fisher's Z and a variance term, thus permitting independent
evaluation of accuracy and variability. The correlation score is
computed by determining the correlation between each judge's predicted
values and the corresponding actual values on individual items,
converting to Fisher's Z, and averaging across items. The variance score
is computed by determining the variance of each judge's predicted scores
on individual items and averaging across items.
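The Interpersonal Accuracy components can be sketched the same way: one correlation per item, computed across interviewees, transformed to Fisher's Z and averaged, plus the per-item variance of the judge's predictions, averaged. The sketch follows the description above literally; the names, data layout, and population-variance divisor are assumptions, and an item on which a judge gives every interviewee the same answer (zero variance) leaves the correlation undefined.

```python
import math
from statistics import fmean, pvariance

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined: zero variance")
    return sxy / math.sqrt(sxx * syy)

def interpersonal_accuracy(predicted, actual):
    """Return (mean Fisher Z across items, mean prediction variance across
    items) for one judge. Each item's correlation is computed across
    interviewees, i.e. down one column of the row-per-interviewee layout."""
    z_per_item, var_per_item = [], []
    for p_col, a_col in zip(zip(*predicted), zip(*actual)):
        z_per_item.append(math.atanh(pearson_r(p_col, a_col)))
        var_per_item.append(pvariance(p_col))
    return fmean(z_per_item), fmean(var_per_item)
```

Because each correlation runs across interviewees, this score rewards ordering the interviewees correctly on each item, which is the sense of Interpersonal Accuracy given above.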

Procedure 

The 186 subjects in this experiment were divided into 62 three-person
groups. The division was made at the time the experiment was
conducted, and most groups consisted of three persons seated next to
each other in the experimental room. Group composition in terms of the
sex of group members was roughly random. The subjects saw each film
and first completed the judging instruments independently. They then
joined together in group discussion and proceeded to arrive at
a consensus judgment for the items on the judging instruments without
referring back to, or looking at, their earlier independent judgments.

The "artificial group" judgment was derived from the first
individual judgments of the group members. Thus, on the Adjective
Check List, the "artificial group" judgment was determined on the basis
of a "majority vote" of the judges on each item (by inspecting their
individual judging protocols). On the Belief-Values Inventory it was
calculated by determining the average of the values predicted by the three
judges for each interviewee on each item. It is important to emphasize
that this "artificial group" is not a group in the psychological sense,
but only a statistical combination of the original independent judgments.
The "average accuracy of individuals composing the group" was, of
course, obtained by computing the mean of the accuracy scores of the
three individuals who made up each group. It is most important to note
that this is not the same thing as the "artificial group" procedure,
in which it was the actual predictions of the three group members that
were averaged rather than their accuracy scores.
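The distinction drawn above, between combining predictions and averaging accuracy scores, can be made concrete with a small Python sketch. The function names and the toy answer key are hypothetical; the majority vote assumes exactly three judges and two alternatives per ACL item, so that a strict majority always exists.

```python
from collections import Counter
from statistics import fmean

def acl_number_correct(choices, key):
    """ACL score: number of items on which the chosen adjective matches the key."""
    return sum(choice == answer for choice, answer in zip(choices, key))

def artificial_group_acl(judge_choices):
    """Majority vote on each item across the judges' independent choices."""
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*judge_choices)]

def artificial_group_bvi(judge_predictions):
    """Average the judges' predicted values for each interviewee on each item.

    judge_predictions[j][i][k] is judge j's prediction for interviewee i, item k.
    """
    return [
        [fmean(values) for values in zip(*rows)]  # rows: one row per judge
        for rows in zip(*judge_predictions)       # one interviewee at a time
    ]

# Toy example: each judge misses a different item.
key = ["a", "a", "a"]
judges = [["a", "a", "b"], ["a", "b", "a"], ["b", "a", "a"]]

average_of_individuals = fmean(acl_number_correct(j, key) for j in judges)  # 2.0
artificial_group = acl_number_correct(artificial_group_acl(judges), key)      # 3
```

Each judge is right on two of the three items, so the average accuracy of the individuals is 2.0, yet the majority vote is right on all three. Combining predictions can therefore outscore every individual judge, which averaging accuracy scores never can.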

The "best judge" in each group was selected on the basis of his
accuracy scores. In interpreting the results of this study, therefore,
it is important to note that this selection was done on an after-the-fact
basis, thus maximizing accuracy scores for this condition by
capitalizing on chance. It would, therefore, be impossible for a "best
judge" selected in advance to obtain a higher score than this, and such
a "best judge" would, in fact, probably score somewhat lower, since some
error would be involved in any advance selection. The best judges were
selected independently for the ACL and the BVI and therefore were not
necessarily the same person on the two different instruments. On the
BVI, however, the "best judges" selected on the basis of total score
were also used as "best judges" in making the comparisons involving the
other scores derived from this instrument.
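The argument that after-the-fact selection capitalizes on chance can be checked with a small simulation. The model is hypothetical and not part of the study: three judges with identical true accuracy whose observed scores differ only by independent normal error, with an arbitrary true score of 100 and error SD of 4.

```python
import random
from statistics import fmean

def simulate_best_judge(trials=10_000, true_score=100.0, error_sd=4.0, seed=1):
    """Hypothetical model: three judges share the same true score, and their
    observed scores differ only by independent normal error.

    Returns (mean score of the judge picked after the fact as best,
             mean score of a judge fixed in advance).
    """
    rng = random.Random(seed)
    best_after_the_fact, fixed_in_advance = [], []
    for _ in range(trials):
        scores = [rng.gauss(true_score, error_sd) for _ in range(3)]
        best_after_the_fact.append(max(scores))  # selection uses the scores themselves
        fixed_in_advance.append(scores[0])       # choice made before any scoring
    return fmean(best_after_the_fact), fmean(fixed_in_advance)
```

In expectation the maximum of three independent standard normal values is 3/(2 sqrt(pi)), about 0.85, so under this model the after-the-fact "best judge" averages roughly 0.85 x 4, or about 3.4 points, above the true score even though no judge is actually better than the others.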
Results

The means and standard deviations for each judgment procedure on
each judgment score are presented in Table 1. In Table 1, all scores
are accuracy scores. Since the total score on the BVI is based on an
error score, in Table 1 this judgment score is transformed to a standard
score distribution with mean = 50 and standard deviation = 10.

As a first step in the statistical analysis of these data, overall
F tests were calculated for each of the judgment scores separately.
The results of this analysis are presented in Table 2. No test for
homogeneity of variance was made before calculation of these F tests.
This procedure was followed because the recent work of Boneau (1960)
strongly suggests that F is not significantly affected by heterogeneity
of variance if the sample sizes are identical and relatively large
(i.e., 20 or more). Both of these conditions hold in the present study.
It is also known that available tests for homogeneity of variance are
too strongly affected by variables other than the one involved in the
null hypothesis to justify their use prior to an analysis of variance
(Box, 1953).
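The overall F test reported in Table 2 is a one-way analysis of variance across the four judgment procedures. A minimal standard-library sketch follows; the function name is hypothetical, and it treats the procedures as separate groups, which is what the degrees of freedom in Table 2 imply (3 between and 244 within for four procedures of 62 scores each). It does not compute the p value, which would require the F distribution.

```python
from statistics import fmean

def one_way_anova(groups):
    """One-way ANOVA on independent groups of scores.

    Returns (ms_between, ms_within, F, df_between, df_within).
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    means = [fmean(g) for g in groups]
    grand_mean = fmean(x for g in groups for x in g)
    # Between-groups sum of squares: spread of group means around the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-groups sum of squares: spread of scores around their own group mean
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n_total - k
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return ms_between, ms_within, ms_between / ms_within, df_between, df_within
```

For example, one_way_anova([[1, 2, 3], [4, 5, 6], [7, 8, 9]]) returns (27.0, 1.0, 27.0, 2, 6). The F column of Table 2 is the same ratio of mean squares; for the BVI total score, 1423.02 / 84.33 is about 16.87.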

Since all of the F tests in Table 2 are significant at or beyond
the .01 level of confidence, a test for significance of difference
between individual means was made. This test was made using the
Multiple Range Test (Li, 1957, p. 238), which is the most appropriate
procedure known to the experimenters for making "post-mortem" type
comparisons between individual means after an overall F test has been
made. Briefly, the Multiple Range Test involves computing a value
which represents how large the difference between two means must be in
order to be significant at a stated level, and then comparing the
obtained difference to this value. Results of this analysis are
summarized in Table 3.
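The logic of the Multiple Range Test as summarized above can be sketched as follows: rank the means, and for each pair compare the obtained difference with a shortest significant range equal to a tabled studentized-range coefficient times the standard error of a mean, sqrt(MS_within / n). The coefficients depend on the significance level, the error degrees of freedom, and how many ranked means a comparison spans; they must be read from published tables and are supplied here as a parameter rather than computed. The function name is hypothetical, and the sketch omits the refinement that no difference lying inside a range already found non-significant is declared significant.

```python
import math
from itertools import combinations

def multiple_range_comparisons(means, ms_within, n, critical_q):
    """Compare every pair of means with its shortest significant range.

    means:      dict mapping procedure name -> mean score
    ms_within:  error mean square from the overall F test
    n:          number of scores behind each mean (equal n assumed)
    critical_q: dict mapping span p (number of ranked means covered, 2..k)
                -> tabled studentized-range coefficient at the chosen level

    Returns {(lower, higher): (difference, shortest_range, significant)}.
    """
    se = math.sqrt(ms_within / n)           # standard error of one mean
    ordered = sorted(means, key=means.get)  # names ranked by mean, low to high
    rank = {name: i for i, name in enumerate(ordered)}
    results = {}
    for low, high in combinations(ordered, 2):
        span = rank[high] - rank[low] + 1   # how many ranked means the pair covers
        difference = means[high] - means[low]
        shortest_range = critical_q[span] * se
        results[(low, high)] = (difference, shortest_range, difference > shortest_range)
    return results
```

Assuming the Table 2 error term was used, for the ACL MS_within = 13.82 and each mean rests on 62 groups, so the standard error is sqrt(13.82 / 62), about 0.47; the shortest significant range for any pair of ACL means is this value multiplied by the appropriate tabled coefficient.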
Table 1

Means and Standard Deviations of Judgment Scores

                                        Average of
                                        Individuals
                                        Composing      "Best        Group      Artificial
Judgment Score                          the Group      Judge"     Consensus      Group

Adjective Check List            Mean      97.27        101.66      102.52       103.32
                                SD         3.51          3.91        3.95         4.55

Belief-Values Inventory         Mean      43.29         53.92       49.47        52.87
  Total                         SD         8.61          8.31       11.16         8.02

Belief-Values Inventory         Mean       1.19          1.44        1.28         1.41
  Stereotype Accuracy Z         SD          .28           .37         .45          .37

Belief-Values Inventory         Mean        .35           .40         .31          .30
  Stereotype Accuracy Variance  SD          .14           .22         .15          .13

Belief-Values Inventory         Mean        .90          1.01        1.00          .98
  Interpersonal Accuracy Z      SD          .12      [illegible]     .16          .15

Belief-Values Inventory         Mean       1.09          1.06        1.06          .91
  Interpersonal Accuracy        SD          .23           .28         .31          .25
  Variance
Table 2

Results of Overall F Tests for Judgment Scores

                                          Between       Within
                                          Variance      Variance
Judgment Score                            (d.f. = 3)    (d.f. = 244)      F        p

Adjective Check List Total                  451.82         13.82        32.62     .001
Belief-Values Inventory Total              1423.02         84.33        16.87     .001
Belief-Values Inventory
  Stereotype Accuracy Z                       .8633         .1432         6.03     .001
Belief-Values Inventory
  Stereotype Accuracy Variance                .1333         .0282         4.72     .01
Belief-Values Inventory
  Interpersonal Accuracy Z                    .1633         .0213         7.67     .001
Belief-Values Inventory
  Interpersonal Accuracy Variance             .3900         .0754         5.17     .01
Table 3

Tests for Significance of Difference Between
Individual Means for Each Judgment Score

Entries are differences between means.

                          Avg. of     Avg. of     Avg. of     Best Judge  Best Judge  Grp. Cons.
                          Ind. vs.    Ind. vs.    Ind. vs.    vs. Grp.    vs. Art.    vs. Art.
Judgment Score            Best Judge  Grp. Cons.  Art. Grp.   Consensus   Grp.        Grp.

Adj. Check List Total       4.39**      5.25**      6.05**       .86        1.66*        .80
BVI Total                  10.63**      6.18**      9.58**      4.45**      1.05        3.40*
BVI Stereo. Acc. Z           .25**       .09         .22**       .16*        .03         .13*
BVI Stereo. Acc. Var.        .05         .04         .05         .09**       .10**       .01
BVI Inter. Acc. Z            .11**       .10**       .08**       .01         .03         .02
BVI Inter. Acc. Var.         .03         .03         .18**       .00         .15**       .15**

*Significant at .05 level
**Significant at .01 level
Discussion

On each of the four accuracy measures, the "best judge" and both
group judgments are significantly superior to the average of the
individuals composing the group. Thus, the major hypothesis of this
experiment is confirmed. There is no consistent pattern of significant
differences among the first three procedures mentioned above. As would
be expected, on the two scores representing the amount of variability in
predictions, the "artificial group" mean tends to be lower than the
means of the other three procedures. This tendency is significant,
however, only for the Interpersonal Accuracy variance score. It is
somewhat surprising to find that the "artificial group" is superior to
the "best judge" on the Adjective Check List. The interpretation of
this finding seems to be that if both other judges disagree with the
"best judge," they are more likely to be right than is the "best judge."
If, on the other hand, only one of the other judges disagrees with the
"best judge," he is more likely to be wrong than is the "best judge."

This study clearly implies that satisfactory ratings are least
likely to be obtained from a single individual. In exploring further
implications of these results for an operational rating setup, several
other considerations enter in. The first of these is that typically
the "best judge" would be difficult to select on an a priori basis,
and (because of selection error) "best judges" selected a priori would
probably score lower than the "best judges" used in this study. Since
each of the group procedures produces results roughly equivalent to the
"best judge" selected on an after-the-fact basis, an extensive (and
expensive) effort to identify best judges and use them as raters would
appear to be unnecessary.
The second consideration involved in applying these results is
that by far the most time in this experiment was consumed in arriving
at consensus judgments through group discussion, a finding which one
would certainly expect to generalize to other situations. Since the
"artificial group" procedure produced results as good as or better than
the results produced by the consensus judgment, and required much less
time, it would appear to be most appropriate when accuracy and time are
both considered. Thus, the best procedure for using ratings in many
applied situations would be to obtain several independent ratings from
different raters for each ratee, and then combine these ratings
statistically into a single rating. It should be noted, however, that
the superiority of the "artificial group" in terms of time required (and
therefore expense) might disappear if only a single summary rating were
required rather than the many relatively specific judgments required by
the experimental procedure used in this study.

A limitation to these conclusions is the fact that each rater in
this experiment was basing his ratings on the same information
(i.e., seeing the same movies of the interviews). If different
raters are basing their ratings on different information, some other
procedure involving the sharing of this information might be superior.

In addition to the practical implications outlined above, these
results present, in the opinion of the authors, at least two more
basic additions to previous psychological research. The first of these
is the demonstration, through both the Stereotype Accuracy correlation
term and the Interpersonal Accuracy correlation term of the Belief-Values
Inventory, that accuracy is increased through grouping independent
of a reduction in variability (see Table 1). Unlike the other results
of this experiment, this would not necessarily be expected on the basis
of previous studies comparing group and individual performance. The
second major addition is related to the current controversy in the
"interpersonal perception" literature over the relative merits of
various types of accuracy scores (Cronbach, 1955). In the
current study the total score on the Adjective Check List, the total
score on the Belief-Values Inventory, and the Stereotype Accuracy and
Interpersonal Accuracy correlation terms all gave consistent results
and, more important, results which make sense in terms of previous
research comparing group and individual performance. This would lead
one to hope that the interpretations of different types of accuracy
scores have more in common than previous investigators have thought.
CHAPTER III

Components  of  Person  Perception  Scores  and  the 
Clinical  and  Statistical  Prediction  Controversy 

On the basis of experience with the analysis of components of
accuracy scores reported in Chapter I, Richards (1960) has recently
suggested a reconceptualization of the clinical and statistical
prediction controversy. Briefly, Richards suggests that previous studies
comparing clinical and statistical prediction have been heavily loaded
on Stereotype Accuracy, and that if comparisons were made on a measure
of the Interpersonal Accuracy component, the results might well favor
clinical predictions.

A study reported in detail elsewhere (Cline and Richards, 1960 B)
was conducted to test this proposed reconceptualization. The specific
hypotheses tested were that on the Belief-Values Inventory, statistical
prediction is superior to clinical on Stereotype Accuracy, but that
clinical prediction is superior to statistical on Interpersonal Accuracy.
These hypotheses were tested using 56 college student "clinicians," who
were tested in the experimental judging situation in which they made
predictions about six standard persons presented by means of sound-color
movies of an interview. The data support both hypotheses. An incidental
finding was that these student clinicians differentiated much more
between interviewees than the accuracy of their differentiations
justified.

In the researchers' opinion, these results help to clarify some of
the confusing issues raised by Meehl (1954) and others, and further
demonstrate the power and utility of component accuracy scores when
those scores are both psychologically meaningful and methodologically
sophisticated. It is encouraging to note that this study suggests that
clinical prediction is well suited to many usual activities of typical
clinicians, e.g., rank ordering a group of patients in terms of probable
benefit from psychotherapy.